Extended many-item similarity indices for sets of nucleotide and protein sequences
Authors
Abstract
Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows. It is usually accomplished by pairwise comparisons of sequences, using either simple concepts (e.g., percent identity) or more intricate ones (e.g., substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., the calculation of similarity for sets of fingerprints), here we introduce novel symmetric indices for analogous calculations on sets of character sequences with more than two (t) possible items (e.g., t = 4 for nucleotides and t = 20 for amino acids). The features of these new indices are studied in detail by analysis of variance (ANOVA) and demonstrated in three case studies on protein/DNA sets with varying degrees of similarity (or evolutionary proximity). The Python code for the extended many-item similarity indices is publicly available at: https://github.com/ramirandaq/tn_Comparisons.
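The abstract contrasts many pairwise comparisons with a direct, set-level quantification built from the t possible characters. As a minimal sketch of why per-column character counts can replace explicit pairs, the Python below computes the mean pairwise identity (as a fraction in [0, 1]) of an aligned, equal-length set in two ways: over all N(N-1)/2 pairs, and from column counts alone, using the counting fact that a character occurring k times in a column contributes comb(k, 2) matching pairs there. All function names, the toy sequences, and `coincidence_fraction` (a thresholded "most frequent character" score) are hypothetical illustrations; they are not the indices defined in the paper and not the API of the tn_Comparisons repository.

```python
from collections import Counter
from itertools import combinations
from math import comb


def _check_aligned(seqs):
    # Both routes assume an aligned set: at least two sequences of equal length.
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need >= 2 aligned sequences of equal length")
    return len(seqs), len(seqs[0])


def mean_pairwise_identity(seqs):
    """Pairwise route: average identity over all N*(N-1)/2 sequence pairs."""
    n, length = _check_aligned(seqs)
    matches = sum(
        sum(a == b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)
    )
    return matches / (comb(n, 2) * length)


def column_count_identity(seqs):
    """Set route: the same quantity from per-column character counts only.

    If character c occurs k_c times in a column, exactly comb(k_c, 2) of the
    comb(N, 2) pairs match there, so no pair is ever formed explicitly.
    """
    n, length = _check_aligned(seqs)
    matches = sum(
        sum(comb(k, 2) for k in Counter(column).values()) for column in zip(*seqs)
    )
    return matches / (comb(n, 2) * length)


def coincidence_fraction(seqs, threshold=0.5):
    """Hypothetical many-item score (NOT an index from the paper): the share of
    columns whose most frequent character covers >= threshold of the N sequences."""
    n, length = _check_aligned(seqs)
    hits = sum(
        max(Counter(column).values()) / n >= threshold for column in zip(*seqs)
    )
    return hits / length


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACTT"]  # toy aligned DNA set, t = 4
    print(mean_pairwise_identity(toy))     # 0.6666666666666666 (8 of 12 pair-positions match)
    print(column_count_identity(toy))      # same value, from column counts only
    print(coincidence_fraction(toy, 0.9))  # 0.5 (only the two fully conserved columns)
```

The pairwise route scales with the number of pairs, while the column route makes a single pass over the N × L characters; this is the kind of saving that motivates computing similarity for a whole set directly rather than through pairwise comparisons.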
Similar resources
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...
Some Similarity Measures for Picture Fuzzy Sets and Their Applications
In this work, we present some novel processes to measure the similarity between picture fuzzy sets. Firstly, we adopt the concept of intuitionistic fuzzy sets, interval-valued intuitionistic fuzzy sets and picture fuzzy sets. Secondly, we develop some similarity measures between picture fuzzy sets, such as cosine similarity measure, weighted cosine similarity measure, set-theoretic similar...
Similarity of Event Sequences (Extended Abstract)
Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for measuring the similarity of event sequences, and show that the resulting measure of distance can be efficiently computed using a form of dynamic programming.
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...
Clustering Item Data Sets with Association-Taxonomy Similarity
We explore in this paper the efficient clustering of item data. Different from those of the traditional data, the features of item data are known to be of high dimensionality and sparsity. In view of the features of item data, we devise in this paper a novel measurement, called the association-taxonomy similarity, and utilize this measurement to perform the clustering. With this association-taxo...
Journal
Journal title: Computational and Structural Biotechnology Journal
Year: 2021
ISSN: 2001-0370
DOI: https://doi.org/10.1016/j.csbj.2021.06.021